RELATIVE-ENTROPY MINIMIZATION WITH UNCERTAIN CONSTRAINTS -
THEORY AND APPLICATION TO SPECTRUM ANALYSIS

I. INTRODUCTION


The relative-entropy principle (REP) is a general, information-theoretic
method for inference when information about an unknown probability density $q^\dagger$
consists of an initial estimate $p$ and additional constraint information that
restricts $q^\dagger$ to a specified convex set of probability densities. Typically the
constraint information consists of linear-equality constraints, that is, expected values

$$\bar f_r = \int f_r(x)\, q^\dagger(x)\, dx \qquad (1)$$

for known $f_r(x)$ and $\bar f_r$, $r = 0, 1, \ldots, M$. The principle states that one should
choose the final estimate $q$ that satisfies

$$H(q,p) = \min_{q'} H(q',p),$$

where $H$ is the relative entropy (cross entropy, discrimination information,
directed divergence, I-divergence, K-L number, etc.),

$$H(q,p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx, \qquad (2)$$

and where $q'$ varies over the set of densities that satisfy the constraints. When
these are linear-equality constraints (1), the final estimate has the form

$$q(x) = p(x) \exp\Big[-\alpha - \sum_r \beta_r f_r(x)\Big], \qquad (3)$$

where the $\beta_r$ and $\alpha$ are Lagrangian multipliers determined by (1) (with $q^\dagger$
replaced by $q$) and by the normalization constraint

$$\int q(x)\, dx = 1. \qquad (4)$$


Properties of REP solutions and conditions for their existence are discussed in
[1, 2]. Expressed in terms of the expected values and the Lagrangian multipliers,
the relative entropy at the minimum is given by


$$H(q,p) = -\alpha - \sum_r \beta_r \bar f_r. \qquad (5)$$


The normalization multiplier $\alpha$ is given by

$$\alpha = \log \int p(x) \exp\Big[-\sum_r \beta_r f_r(x)\Big]\, dx. \qquad (6)$$


The quantity $Z = e^{\alpha}$ is often referred to as the partition function. If the partition
function can be evaluated analytically, i.e., if the integral in (6) can be
performed, the relations


$$-\frac{\partial \alpha}{\partial \beta_r} = \bar f_r \qquad (7)$$
can sometimes be solved to express the $\beta_r$ as functions of the expected values
$\bar f_r$. If not, various computational methods can be used to find the values of
$\alpha$ and the $\beta_r$ in (3) that satisfy (1) and (4) [3]. As a general method of statistical
inference, the REP was first introduced by Kullback [4], has been advocated in
various forms by others [5, 6, 7], and has been applied in a variety of fields (for a
list of references, see [3]).
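As a concrete illustration of these relations, the following minimal sketch (our own, not one of the methods surveyed in [3]) assumes a density on a finite set of points, so that the integrals in (1) through (6) become sums. With a single constraint function, $-\partial\alpha/\partial\beta$ is the mean of $f$ under $q$, and this mean decreases monotonically in $\beta$ (its derivative is minus the variance of $f$ under $q$), so plain bisection finds $\beta$. The names `log_partition` and `rep_exact`, the search interval, and the test data are illustrative assumptions.

```python
import math

def log_partition(p, f, beta):
    """alpha of (6) on a finite support: log sum_j p_j exp(-beta * f_j)."""
    return math.log(sum(pj * math.exp(-beta * fj) for pj, fj in zip(p, f)))

def rep_exact(p, f, fbar, lo=-50.0, hi=50.0, iters=200):
    """Single-constraint REP: q_j = p_j exp(-alpha - beta f_j), as in (3),
    with sum_j q_j f_j = fbar.  Requires min(f) < fbar < max(f); the bracket
    [lo, hi] is assumed to be wide enough for the data at hand."""
    def mean_f(beta):
        a = log_partition(p, f, beta)
        return sum(fj * pj * math.exp(-a - beta * fj) for pj, fj in zip(p, f))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_f(mid) > fbar:   # mean still too large, so beta must increase
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = log_partition(p, f, beta)
    q = [pj * math.exp(-alpha - beta * fj) for pj, fj in zip(p, f)]
    return q, alpha, beta

# Uniform initial estimate on four points; require the mean of f(x) = x to be 1.
p = [0.25, 0.25, 0.25, 0.25]
f = [0.0, 1.0, 2.0, 3.0]
q, alpha, beta = rep_exact(p, f, 1.0)
H = sum(qj * math.log(qj / pj) for qj, pj in zip(q, p))   # relative entropy (2)
print(H, -alpha - beta * 1.0)                              # the two sides of (5)
```

A central difference of `log_partition` in `beta` can likewise be compared with $\bar f$ to check (7).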

Informally speaking, of the densities that satisfy the constraints, the REP
selects the one that is closest to $p$ in the sense measured by relative entropy.
In more formal terms, the REP can be justified on the basis of the
information-theoretic properties of relative entropy [?], or on the basis of consistency
axioms for logical inference [8]. In applications of the REP, the known expected
values $\bar f_r$ in (1) frequently correspond to physical measurements. Such
measurements usually are subject to error, so that strict equality in (1) is unrealistic.

In the next section we discuss the REP with "uncertain constraints," a form
of the principle appropriate for applications with uncertainty in the expected
values. In the third section, relative-entropy minimization with uncertain
constraints is applied to spectrum analysis, and a relative-entropy spectrum estimate
from uncertain autocorrelations is derived. The fourth and fifth sections are
devoted to a numerical example and a concluding discussion, respectively.


II. RELATIVE-ENTROPY MINIMIZATION WITH UNCERTAIN CONSTRAINTS


In this section we extend the results on the REP with linear-equality
constraints to incorporate uncertainty about the values of the $\bar f_r$ in (1). We define
an error vector $v$ with components

$$v_r = \int f_r(x)\, q^\dagger(x)\, dx - \bar f_r. \qquad (8)$$

A simple generalization would be to replace the set of constraints (1) with a
bound on the magnitude of $v$:

$$\sum_r \Big(\int f_r(x)\, q^\dagger(x)\, dx - \bar f_r\Big)^2 \le \varepsilon^2. \qquad (9)$$

However, the components $v_r$ may not all have equal uncertainty, and different
components may be correlated. We therefore replace (9) with the more general
constraint

$$\sum_{r,s} M_{rs}\, v_r v_s \le \varepsilon^2. \qquad (10)$$

In matrix notation this is

$$v^{\mathrm T} M v \le \varepsilon^2, \qquad (11)$$

where $M$ is any positive-definite matrix.

We assume that we are given an initial estimate $p$ of $q^\dagger$, measured values $\bar f_r$
of the expectations (1) of functions $f_r$ for a finite set of indices $r$, and an error
estimate $\varepsilon$. We will first derive the form of the final estimate $q$ under the
assumption that the constraint has the form (9) and that the $\bar f_r$ are 0; that is,
we assume a constraint

$$\sum_r \Big(\int f_r(x)\, q^\dagger(x)\, dx\Big)^2 \le \varepsilon^2. \qquad (12)$$

Next we show how to reduce the more general constraint (10) to this case. We
conclude this section with a remark on the relation between the result with $\varepsilon > 0$
and that for "exact constraints" ($\varepsilon = 0$).


Our problem is to minimize the relative entropy $H(q,p)$ subject to the
constraint (12) (with $q$ in place of $q^\dagger$) and the normalization constraint (4). If the
initial estimate satisfies the constraint (i.e., (12) holds with $p$ in place of $q^\dagger$),
then setting $q = p$ gives the minimum. Otherwise equality holds in (12), and the
criterion for a minimum is that the variation of
criterion  for  a  minimum  is  that  the  variation  of 


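$$\int q(x) \log\frac{q(x)}{p(x)}\, dx + \frac{\lambda}{2}\Big[\sum_r \Big(\int f_r(x)\, q(x)\, dx\Big)^2 - \varepsilon^2\Big] + (\alpha - 1)\Big[\int q(x)\, dx - 1\Big] \qquad (13)$$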


with respect to $q(x)$ is zero for some Lagrange multipliers $\lambda > 0$, corresponding
to (12), and $\alpha - 1$, corresponding to (4). (We write $\alpha - 1$ instead of $\alpha$ for later
convenience.) With $\lambda > 0$, the criterion intuitively implies that a small change $\delta q$
in $q$ that leaves $\int q(x)\, dx$ fixed and decreases $H(q,p)$ must increase the error
term

$$\sum_r \Big(\int f_r(x)\, q(x)\, dx\Big)^2.$$

Equating the variation of (13) to zero gives

$$\log\frac{q(x)}{p(x)} + \alpha + \lambda \sum_r f_r(x) \int f_r(x')\, q(x')\, dx' = 0.$$


Therefore $q$ satisfies

$$q(x) = p(x) \exp\Big[-\alpha - \sum_r \beta_r f_r(x)\Big], \qquad (14)$$

where

$$\beta_r = \lambda \int f_r(x)\, q(x)\, dx. \qquad (15)$$

Conversely, if $q$ has the form (14), and if $\alpha$, $\lambda$, and the $\beta_r$ are chosen so that (15),
the constraint (12), and the normalization condition (4) hold, then $q$ is a solution
to the minimization problem. But if (15) holds, the constraint with equality is
equivalent to

$$\sum_r \frac{\beta_r^2}{\lambda^2} = \varepsilon^2,$$

or to

$$\lambda = \frac{\|\beta\|}{\varepsilon},$$

where we have written $\|\beta\|$ for the Euclidean norm $\big(\sum_r \beta_r^2\big)^{1/2}$.
Thus if we choose $\alpha$ and the $\beta_r$ in (14) so that (4) and

$$\varepsilon \frac{\beta_r}{\|\beta\|} = \int f_r(x)\, q(x)\, dx \qquad (16)$$

hold, then the constraint (12) will be satisfied, and we can ensure that (15) holds
by the choice of $\lambda$.

Next assume a constraint of the general form (10), (8), with a symmetric,
positive-definite matrix $M$. Then there is a matrix $A$, not in general unique, such
that $A^{\mathrm T} A = M$. Now

$$v^{\mathrm T} M v = v^{\mathrm T} A^{\mathrm T} A v = (Av)^{\mathrm T} (Av),$$

and so the constraint assumes the form

$$\sum_r u_r^2 \le \varepsilon^2,$$

where

$$u_r = \sum_s A_{rs} v_s. \qquad (17)$$




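In view of (4) we may rewrite (8) as

$$v_r = \int \big(f_r(x) - \bar f_r\big)\, q^\dagger(x)\, dx$$

and obtain

$$u_r = \int \sum_s A_{rs} \big(f_s(x) - \bar f_s\big)\, q^\dagger(x)\, dx.$$

Defining

$$g_r(x) = \sum_s A_{rs} \big(f_s(x) - \bar f_s\big), \qquad (18)$$

we obtain

$$\sum_r \Big(\int g_r(x)\, q^\dagger(x)\, dx\Big)^2 \le \varepsilon^2 \qquad (19)$$

from (17) and the constraint on the $u_r$. Thus constraints of the general form (10) can be
transformed to (19), which is of the same form as (12).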
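The reduction from (10) to (19) can be checked numerically on a finite support. The sketch below assumes a small symmetric positive-definite $M$, takes $A$ to be the transpose of the Cholesky factor of $M$ (one admissible choice among the non-unique matrices with $A^{\mathrm T} A = M$), builds the functions $g_r$ of (18), and confirms that $\int g_r q\,dx$ reproduces $u_r$ and that $v^{\mathrm T} M v = \sum_r u_r^2$ for a normalized $q$. The function names and test values are hypothetical.

```python
import math

def cholesky_upper(M):
    """Return an upper-triangular A with A^T A = M, for symmetric positive-definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]          # lower-triangular factor, M = L L^T
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return [[L[j][i] for j in range(n)] for i in range(n)]   # A = L^T

def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def quad_form(M, v):
    """v^T M v."""
    return sum(vi * mvi for vi, mvi in zip(v, mat_vec(M, v)))

def transformed_functions(A, f, fbar):
    """g_r(x_j) = sum_s A_rs (f_s(x_j) - fbar_s), eq. (18); f[s][j] = f_s(x_j)."""
    n_pts = len(f[0])
    return [[sum(A[r][s] * (f[s][j] - fbar[s]) for s in range(len(fbar)))
             for j in range(n_pts)] for r in range(len(A))]

M = [[2.0, 0.5], [0.5, 1.0]]
A = cholesky_upper(M)
f = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]      # f_s(x_j) on three support points
fbar = [0.8, 0.4]
q = [0.2, 0.3, 0.5]                          # any normalized density on the support
v = [sum(fs[j] * q[j] for j in range(3)) - fb for fs, fb in zip(f, fbar)]   # (8)
u = mat_vec(A, v)                                                          # (17)
g = transformed_functions(A, f, fbar)                                      # (18)
print(quad_form(M, v), sum(ur * ur for ur in u))
```

Any other factor with $A^{\mathrm T} A = M$ (for example a symmetric square root) would serve equally well; Cholesky is used here only because it is short to write.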

We note that (14) is identical to (3): the functional form of the solution with
uncertain constraints is the same as that for exact constraints. The difference
is that, for uncertain constraints, the conditions that determine the $\beta_r$ have the
general form (16). These conditions reduce to the exact-constraint case for
$\varepsilon = 0$. One way of viewing this identity of form for the solutions of the two
problems is to note that every solution $q$ of an uncertain-constraint problem is
simultaneously a solution of an exact-constraint problem with the same
functions $f_r$ and appropriately modified values for the $\bar f_r$.
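For a single constraint with $M = 1$, this observation gives a direct recipe, sketched below under the assumption of a finite support. If $|E_p f - \bar f| \le \varepsilon$, the answer is $q = p$. Otherwise the single-constraint version of (16), shifted by $\bar f$ as in (21), reads $\varepsilon\,\mathrm{sign}(\beta) = E_q f - \bar f$, so $q$ is the exact-constraint solution with the modified target $\bar f + \varepsilon$ or $\bar f - \varepsilon$, whichever edge of the tolerance interval faces $E_p f$. The bisection bracket and the helper names are assumptions of this illustration.

```python
import math

def tilt(p, f, beta):
    """q_j = p_j exp(-alpha - beta f_j), the common form of (3) and (14); returns (q, alpha)."""
    w = [pj * math.exp(-beta * fj) for pj, fj in zip(p, f)]
    z = sum(w)
    return [wj / z for wj in w], math.log(z)

def mean(q, f):
    return sum(qj * fj for qj, fj in zip(q, f))

def solve_exact(p, f, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for beta with E_q[f] = target; E_q[f] is decreasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(p, f, mid)[0], f) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def solve_uncertain(p, f, fbar, eps):
    """Single uncertain constraint |E_q[f] - fbar| <= eps; returns beta (0 means q = p)."""
    mean_p = mean(p, f)
    if abs(mean_p - fbar) <= eps:
        return 0.0
    # eps * sign(beta) = E_q[f] - fbar: an exact problem with a shifted target.
    target = fbar + eps if mean_p > fbar else fbar - eps
    return solve_exact(p, f, target)

p, f = [0.25] * 4, [0.0, 1.0, 2.0, 3.0]
beta = solve_uncertain(p, f, fbar=1.0, eps=0.2)
q, _ = tilt(p, f, beta)
print(beta, mean(q, f))   # beta > 0; E_q[f] is 1.2 up to rounding
```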

The relative entropy at the minimum may be computed by substituting (14)
into (2), which leads to

$$H(q,p) = -\alpha - \sum_r \beta_r \int f_r(x)\, q(x)\, dx. \qquad (20)$$

In the case of non-zero expected values, $\bar f_r \neq 0$, (16) becomes

$$\varepsilon \frac{\beta_r}{\|\beta\|} = \int f_r(x)\, q(x)\, dx - \bar f_r. \qquad (21)$$

(For simplicity we take $M$ to be the identity.) Substituting (21) into (20) yields

$$H(q,p) = -\alpha - \sum_r \beta_r \bar f_r - \varepsilon \|\beta\|, \qquad (22)$$

which is the generalization of (5) in the case of uncertain constraints. The
normalization multiplier $\alpha$ has the same functional form as in the exact-constraint
case (6); the generalization of (7) therefore results from differentiating (6),
which yields

$$-\frac{\partial \alpha}{\partial \beta_r} = \int f_r(x)\, q(x)\, dx,$$


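and then substituting (21), which yields

$$-\frac{\partial \alpha}{\partial \beta_r} = \bar f_r + \varepsilon \frac{\beta_r}{\|\beta\|}. \qquad (23)$$

Note that (22) and (23) reduce respectively to (5) and (7) when $\varepsilon = 0$.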
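Relations (21) through (23) can be verified numerically. The sketch below assumes a finite support, two constraint functions, and $M$ equal to the identity. It picks an arbitrary $\beta$, forms $q$ by (14), and then defines the $\bar f_r$ so that (21) holds by construction; it compares the relative entropy computed directly from (2) with (22), and a central-difference derivative of $\alpha$ with (23). All numerical values and function names are hypothetical.

```python
import math

def log_partition(p, f, beta):
    """alpha of (6) on a finite support, with f[r][j] = f_r(x_j)."""
    return math.log(sum(pj * math.exp(-sum(b * fr[j] for b, fr in zip(beta, f)))
                        for j, pj in enumerate(p)))

def tilted(p, f, beta):
    """q of (14): q_j = p_j exp(-alpha - sum_r beta_r f_r(x_j)); returns (q, alpha)."""
    alpha = log_partition(p, f, beta)
    q = [pj * math.exp(-alpha - sum(b * fr[j] for b, fr in zip(beta, f)))
         for j, pj in enumerate(p)]
    return q, alpha

def check_uncertain_identities(p, f, beta, eps, h=1e-6):
    q, alpha = tilted(p, f, beta)
    norm_b = math.sqrt(sum(b * b for b in beta))
    expect = [sum(qj * fr[j] for j, qj in enumerate(q)) for fr in f]
    # Choose fbar so that (21) holds exactly for this beta.
    fbar = [e - eps * b / norm_b for e, b in zip(expect, beta)]
    H_direct = sum(qj * math.log(qj / pj) for qj, pj in zip(q, p))                 # (2)
    H_22 = -alpha - sum(b * fb for b, fb in zip(beta, fbar)) - eps * norm_b        # (22)
    grads = []                     # -d alpha / d beta_r by central differences
    for r in range(len(beta)):
        bp, bm = list(beta), list(beta)
        bp[r] += h
        bm[r] -= h
        grads.append(-(log_partition(p, f, bp) - log_partition(p, f, bm)) / (2 * h))
    rhs_23 = [fb + eps * b / norm_b for fb, b in zip(fbar, beta)]                  # (23)
    return H_direct, H_22, grads, rhs_23

p = [0.1, 0.2, 0.3, 0.4]
f = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 1.0, 0.0]]
H_direct, H_22, grads, rhs_23 = check_uncertain_identities(p, f, beta=[0.3, -0.4], eps=0.5)
print(H_direct, H_22)
print(grads, rhs_23)
```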


III. APPLICATION TO SPECTRUM ANALYSIS


Relative-Entropy Spectrum Analysis (RESA) is an extension of Burg's
Maximum-Entropy Spectral Analysis (MESA) [9, 10] that was introduced by Shore
[11]. Like MESA, it estimates a spectrum from values of the autocorrelation
function. RESA, however, also takes into account prior information in the form
of an initial estimate of the spectrum. Multisignal RESA (MRESA), introduced by
Shore and Johnson [12], simultaneously estimates the power spectra of several
signals when an initial estimate for each spectrum is available and new information
is obtained in the form of values of the autocorrelation function of the sum.
The resulting final estimates are the solution of a constrained minimization
problem: they are consistent with the autocorrelation information and otherwise
as similar as possible to the respective initial estimates in a precisely defined
information-theoretic sense. MRESA has recently been extended by Johnson,
Shore, and Burg to incorporate weighting factors associated with each initial
spectrum estimate, to allow for the fact that initial estimates may not be
uniformly reliable [13].

The autocorrelation values were treated in [11, 12, 13] as exactly given.
Usually, however, these are estimated or measured values subject to error. By
basing a derivation on the REP with uncertain constraints, we will show how to
incorporate an error bound to allow for uncertainty in autocorrelation values.

MRESA assumes the existence of $L$ independent signals with power spectra
$S_i(f)$ and autocorrelations

$$R_r^{(i)} = \int C_r(f)\, S_i(f)\, df, \qquad (24)$$

where

$$C_r(f) = \cos 2\pi r f. \qquad (25)$$
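In discrete form, (24) and (25) amount to a cosine transform of the spectrum. The sketch below assumes, as in the discrete-frequency treatment later in this section, that a spectrum is represented by powers $S(f_k)$ at a finite set of frequencies $f_k$, so that $R_r = \sum_k C_r(f_k) S(f_k)$. Because this map is linear, the autocorrelations of a sum of spectra are the sums of the individual autocorrelations, which is what lets measurements on the sum carry information about each $S_i$. The function name and the test spectra are illustrative assumptions.

```python
import math

def autocorrelations(freqs, powers, n_lags):
    """R_r = sum_k cos(2 pi r f_k) S(f_k) for r = 0, ..., n_lags - 1."""
    return [sum(s * math.cos(2.0 * math.pi * r * fk) for fk, s in zip(freqs, powers))
            for r in range(n_lags)]

freqs = [0.1, 0.25]
s1 = [0.0, 2.0]        # a sinusoid of total power 2 at f = 0.25
s2 = [1.0, 0.0]        # a sinusoid of total power 1 at f = 0.1
r1 = autocorrelations(freqs, s1, 4)
r2 = autocorrelations(freqs, s2, 4)
r_sum = autocorrelations(freqs, [a + b for a, b in zip(s1, s2)], 4)
print(r1)   # approximately [2, 0, -2, 0]
```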


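Given initial estimates $P_i(f)$ of the power spectrum of each signal $S_i$, and
autocorrelation measurements on the sum of the signals, MRESA provides final
estimates $Q_i$ for the $S_i$. In particular, if the final estimates are required to satisfy

$$R_r^{\mathrm{tot}} = \sum_{i=1}^{L} \int C_r(f)\, Q_i(f)\, df \qquad (26)$$

for lags $r = 0, \ldots, M$, where the $R_r^{\mathrm{tot}}$ are the measured autocorrelations of the
sum, the resulting final estimates are

$$Q_i(f) = \frac{1}{\dfrac{1}{P_i(f)} + \sum_{r=0}^{M} \beta_r C_r(f)}, \qquad (27)$$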


where the $\beta_r$ are chosen so that the $Q_i$ satisfy the autocorrelation constraints
(26) [12]. Since some initial estimates may be more reliable than others, these
results have been extended recently to include a frequency-dependent weight
$w_i(f)$ for each initial estimate $P_i(f)$ [13]. The larger the value of the weight $w_i(f)$,
the more reliable the initial estimate $P_i(f)$ is considered to be. With the weights
included, the result (27) becomes

$$Q_i(f) = \frac{1}{\dfrac{1}{P_i(f)} + \dfrac{1}{w_i(f)} \sum_{r=0}^{M} \beta_r C_r(f)}. \qquad (28)$$


Before generalizing MRESA to include uncertain constraints, we review here
some notation and results from [12] and [14]. In [12], for each of the $L$ signals
we used a discrete-spectrum approximation

$$s_i(t) = \sum_{k=1}^{K} \big(a_{ik} \cos 2\pi f_k t + b_{ik} \sin 2\pi f_k t\big) \qquad (i = 1, \ldots, L)$$

with nonzero frequencies $f_k$, not necessarily uniformly spaced.

The $a_{ik}$ and $b_{ik}$ were random variables with independent, zero-mean, Gaussian
initial distributions. We defined random variables

$$x_{ik} = a_{ik}^2 + b_{ik}^2, \qquad (29)$$


representing the power of process $i$ at frequency $f_k$, and we described the
collection of signals in terms of their joint probability density $q^\dagger(x)$, where
$x = (x_1, \ldots, x_L)$ and $x_i = (x_{i1}, \ldots, x_{iK})$. We expressed the power spectrum $S_i$ as
an expectation

$$S_i(f_k) = \int x_{ik}\, q^\dagger(x)\, dx. \qquad (30)$$


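In terms of initial estimates $P_{ik} = P_i(f_k)$ of $S_i(f_k)$, we wrote initial
estimates $p$ of $q^\dagger$ in the form

$$p(x) = \prod_{i=1}^{L} \prod_{k=1}^{K} p_{ik}(x_{ik}), \qquad (31)$$

where

$$p_{ik}(x_{ik}) = \frac{1}{P_{ik}} \exp\Big(-\frac{x_{ik}}{P_{ik}}\Big). \qquad (32)$$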

The assumed Gaussian form of the initial distributions of $a_{ik}$ and $b_{ik}$ is equivalent
to this exponential form for $p_{ik}(x_{ik})$; the coefficients were chosen to make the
expectation of $x_{ik}$ equal to $P_{ik}$. Using (30), we wrote a discrete-frequency form
of (26) as linear constraints

$$R_r^{\mathrm{tot}} = \sum_{i=1}^{L} \sum_{k=1}^{K} \int C_{rk}\, x_{ik}\, q^\dagger(x)\, dx \qquad (33)$$

on expectation values of $q^\dagger$, where

$$C_{rk} = C_r(f_k).$$
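The equivalence between the Gaussian model for $a_{ik}$, $b_{ik}$ and the exponential form (32) is easy to check by simulation: if $a$ and $b$ are independent, zero-mean Gaussians with variance $P/2$, then $x = a^2 + b^2$ is exponential with mean $P$, so that, for example, $\Pr(x > P) = e^{-1}$. The sketch below is a Monte Carlo illustration with an arbitrary seed and sample size; the function name is hypothetical, and the simulation plays no role in the estimation procedure itself.

```python
import math
import random

def sample_power(P, n, seed=0):
    """Draw x = a^2 + b^2 with a, b independent N(0, P/2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(P / 2.0)
    return [rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2 for _ in range(n)]

P = 3.0
xs = sample_power(P, 100_000)
mean_x = sum(xs) / len(xs)
tail = sum(x > P for x in xs) / len(xs)
print(mean_x, tail, math.exp(-1.0))   # mean near P = 3, tail fraction near 0.368
```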


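We obtained a final estimate $q$ of $q^\dagger$ by minimizing the relative entropy

$$H(q,p) = \int q(x) \log\frac{q(x)}{p(x)}\, dx$$

subject to the constraints ((33) with $q$ in place of $q^\dagger$) and the normalization
condition

$$\int q(x)\, dx = 1;$$

the result had the form

$$q(x) = \prod_{i=1}^{L} \prod_{k=1}^{K} q_{ik}(x_{ik}), \qquad (34)$$

where

$$q_{ik}(x_{ik}) = \frac{1}{Q_{ik}} \exp\Big(-\frac{x_{ik}}{Q_{ik}}\Big) \qquad (35)$$

and the $Q_{ik}$ are the final estimates of the power spectra of the $S_i$:

$$Q_{ik} = Q_i(f_k) = \int x_{ik}\, q(x)\, dx. \qquad (36)$$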


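This led to a discrete-frequency version of (27),

$$Q_{ik} = \frac{1}{\dfrac{1}{P_{ik}} + \sum_{r=0}^{M} \beta_r C_{rk}},$$

where the $\beta_r$ had to be chosen so that

$$\sum_{i=1}^{L} \sum_{k=1}^{K} C_{rk}\, Q_{ik} = R_r^{\mathrm{tot}}$$

was satisfied.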
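For more than one lag the $\beta_r$ must be found numerically. A minimal sketch follows; it is not the Newton-Raphson code of [15] but a damped Newton iteration of our own on the concave function $F(\beta) = \sum_{i,k} \log\big(1/P_{ik} + \sum_r \beta_r C_{rk}\big) - \sum_r \beta_r R_r^{\mathrm{tot}}$, whose gradient $\sum_{i,k} C_{rk} Q_{ik} - R_r^{\mathrm{tot}}$ vanishes exactly when the constraints above hold. The frequency grid, initial estimates, and target autocorrelations in the usage lines are hypothetical; the targets are generated from a positive spectrum so that a solution exists.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def mresa_discrete(P, C, R, iters=100, tol=1e-10):
    """Q_ik = 1 / (1/P_ik + sum_r beta_r C_rk), with beta chosen so that
    sum_{i,k} C_rk Q_ik = R_r for every lag r.  P[i][k] = P_ik, C[r][k] = C_rk."""
    n = len(R)

    def denominators(beta):
        return [[1.0 / p + sum(beta[r] * C[r][k] for r in range(n))
                 for k, p in enumerate(Pi)] for Pi in P]

    def objective(beta):
        d = denominators(beta)
        if min(min(di) for di in d) <= 0.0:
            return -math.inf   # outside the region where every Q_ik is positive
        return sum(math.log(x) for di in d for x in di) - sum(b * r for b, r in zip(beta, R))

    beta = [0.0] * n
    for _ in range(iters):
        Q = [[1.0 / x for x in di] for di in denominators(beta)]
        grad = [sum(C[r][k] * q for Qi in Q for k, q in enumerate(Qi)) - R[r] for r in range(n)]
        if max(abs(g) for g in grad) < tol:
            break
        # Minus the Hessian of F; positive definite when the rows of C are independent.
        hess = [[sum(C[r][k] * C[s][k] * q * q for Qi in Q for k, q in enumerate(Qi))
                 for s in range(n)] for r in range(n)]
        step = solve_linear(hess, grad)          # Newton direction for maximizing F
        slope = sum(g * s for g, s in zip(grad, step))
        t, f0 = 1.0, objective(beta)
        # Backtracking: halve the step until F increases enough (Armijo condition).
        while t > 1e-12 and objective([b + t * s for b, s in zip(beta, step)]) < f0 + 1e-4 * t * slope:
            t *= 0.5
        beta = [b + t * s for b, s in zip(beta, step)]
    Q = [[1.0 / x for x in di] for di in denominators(beta)]
    return beta, Q

K, lags = 20, 3
freqs = [(k + 0.5) / (2 * K) for k in range(K)]          # K frequencies in (0, 0.5)
C = [[math.cos(2 * math.pi * r * fk) for fk in freqs] for r in range(lags)]
P = [[1.0] * K, [0.5 + 0.5 * k / K for k in range(K)]]   # two initial estimates
S_true = [[1.0 + 0.3 * math.cos(2 * math.pi * fk) for fk in freqs], P[1]]
R = [sum(C[r][k] * S_true[i][k] for i in range(2) for k in range(K)) for r in range(lags)]
beta, Q = mresa_discrete(P, C, R)
print(beta)
```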

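To handle uncertain constraints, we first replace (26) with a bound

$$\sum_{r=0}^{M} \Big(\sum_{i=1}^{L} \int C_r(f)\, Q_i(f)\, df - R_r^{\mathrm{tot}}\Big)^2 \le \varepsilon^2 \qquad (37)$$

on the Euclidean norm of the error vector $v$ given by

$$v_r = \sum_{i=1}^{L} \int C_r(f)\, Q_i(f)\, df - R_r^{\mathrm{tot}}. \qquad (38)$$

We write a discrete-frequency form of (37) in terms of expectations of $q^\dagger$:

$$\sum_{r=0}^{M} \Big(\sum_{i=1}^{L} \sum_{k=1}^{K} \int C_{rk}\, x_{ik}\, q^\dagger(x)\, dx - R_r^{\mathrm{tot}}\Big)^2 \le \varepsilon^2.$$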


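This has the form (9); by (14), minimizing relative entropy subject to these
constraints gives

$$q(x) = p(x) \exp\Big[-\alpha - \sum_{r=0}^{M} \beta_r \sum_{i=1}^{L} \sum_{k=1}^{K} C_{rk}\, x_{ik}\Big],$$

where the $\beta_r$ are to be determined so that

$$\varepsilon \frac{\beta_r}{\|\beta\|} = \sum_{i=1}^{L} \sum_{k=1}^{K} \int C_{rk}\, x_{ik}\, q(x)\, dx - R_r^{\mathrm{tot}} \qquad (39)$$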


(cf. (16)). Using (32), we find that $q$ has the form (34), where $q_{ik}(x_{ik})$ is
proportional to

$$\exp\Big[-x_{ik}\Big(\frac{1}{P_{ik}} + \sum_{r=0}^{M} \beta_r C_{rk}\Big)\Big].$$

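Consequently $q_{ik}$ is given by (35), with $Q_{ik} = \big(1/P_{ik} + \sum_r \beta_r C_{rk}\big)^{-1}$ equal to
the expectation (36). Rewriting (39) in terms of the $Q_{ik}$ and passing from discrete
to continuous frequencies gives

$$Q_i(f) = \frac{1}{\dfrac{1}{P_i(f)} + \sum_{r=0}^{M} \beta_r C_r(f)}, \qquad (40)$$

where the $\beta_r$ are to be determined so that

$$\varepsilon \frac{\beta_r}{\|\beta\|} = \sum_{i=1}^{L} \int C_r(f)\, Q_i(f)\, df - R_r^{\mathrm{tot}}. \qquad (41)$$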


The functional form (40) of the solution with uncertain constraints is the same
as the form (27) for exact constraints; the difference is in the conditions that
determine the $\beta_r$: (26) for exact constraints and (41) for uncertain constraints.
This is a consequence of the analogous result for probability-density estimation,
noted in the previous section.
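When only the zero lag $r = 0$ is used (so $C_{0k} = 1$ and $\|\beta\| = |\beta_0|$), condition (41) collapses to $\varepsilon\,\mathrm{sign}(\beta_0) = \sum_i \int Q_i\,df - R_0^{\mathrm{tot}}$, and the discrete problem can be solved by a one-dimensional search. The sketch below is a toy illustration under that single-lag assumption (the function name and data are hypothetical): if the total initial power already lies within $\varepsilon$ of $R_0^{\mathrm{tot}}$, then $\beta_0 = 0$ and the initial estimates are kept; otherwise the total power of the final estimates is pulled to the nearer edge of the tolerance interval. With several lags a multidimensional solver, such as the Newton-Raphson approach mentioned in Section IV, would be needed.

```python
def mresa_uncertain_lag0(P, R0, eps, iters=200):
    """Toy uncertain MRESA with the single lag r = 0 (C_0k = 1):
    Q_ik = 1 / (1/P_ik + beta0) subject to |sum_{i,k} Q_ik - R0| <= eps."""
    total_p = sum(sum(Pi) for Pi in P)
    if abs(total_p - R0) <= eps:
        return 0.0, [list(Pi) for Pi in P]          # beta0 = 0: keep the initial estimates
    # Condition (41) with one lag: eps * sign(beta0) = sum_{i,k} Q_ik - R0.
    target = R0 + eps if total_p > R0 else R0 - eps
    if target <= 0.0:
        raise ValueError("the shifted target power must be positive")

    def total(b):
        return sum(1.0 / (1.0 / p + b) for Pi in P for p in Pi)

    lo = -min(1.0 / p for Pi in P for p in Pi)     # total(b) -> infinity as b -> lo from above
    hi = 1.0
    while total(hi) > target:                      # enlarge hi until the root is bracketed
        hi *= 2.0
    for _ in range(iters):                         # total(b) decreases on (lo, infinity)
        mid = 0.5 * (lo + hi)
        if total(mid) > target:
            lo = mid
        else:
            hi = mid
    beta0 = 0.5 * (lo + hi)
    return beta0, [[1.0 / (1.0 / p + beta0) for p in Pi] for Pi in P]

P = [[1.0] * 10, [0.5] * 10]      # total initial power 15
beta0, Q = mresa_uncertain_lag0(P, R0=12.0, eps=1.0)
print(beta0, sum(sum(Qi) for Qi in Q))   # beta0 > 0; total power 13 up to rounding
```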
In the case of the more general constraint form

$$\sum_{r,s} M_{rs}\, v_r v_s \le \varepsilon^2,$$

with the error vector $v$ as in (38), it is convenient to carry the matrix through
the derivation rather than transforming the constraint functions as in (18). The
result is that the final estimates again have the form (27), while the conditions
(41) on the $\beta_r$ are replaced by

$$\varepsilon \frac{\beta'_r}{\big(\sum_s \beta_s \beta'_s\big)^{1/2}} = \sum_{i=1}^{L} \int C_r(f)\, Q_i(f)\, df - R_r^{\mathrm{tot}}, \qquad (42)$$

where

$$\beta' = M^{-1} \beta.$$

In the uncertain-constraint case, when we include weights $w_i(f)$ as in [13], the
functional form of the solution becomes generalized to (28); the conditions that
determine the $\beta_r$, (41) or (42), remain the same.
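Condition (42) can be read as follows: for given $\beta$ it fixes the error vector to $v = \varepsilon\,\beta'/(\beta^{\mathrm T}\beta')^{1/2}$ with $\beta' = M^{-1}\beta$. This vector lies on the boundary $v^{\mathrm T} M v = \varepsilon^2$ and, for $M$ equal to the identity, reduces to the left-hand side of (41). The short sketch below checks these two properties for a hypothetical $2 \times 2$ matrix $M$; the helper names and numbers are our own.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def implied_error_vector(M, beta, eps):
    """v_r = eps * beta'_r / sqrt(sum_s beta_s beta'_s), with beta' = M^{-1} beta, as in (42)."""
    beta_p = solve_linear(M, beta)
    scale = eps / math.sqrt(sum(b * bp for b, bp in zip(beta, beta_p)))
    return [scale * bp for bp in beta_p]

M = [[2.0, 0.5], [0.5, 1.0]]
beta, eps = [0.3, -0.7], 0.8
v = implied_error_vector(M, beta, eps)
quad = sum(v[r] * M[r][s] * v[s] for r in range(2) for s in range(2))
print(quad)   # equals eps**2 = 0.64 up to rounding
```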


IV. EXAMPLE


We shall use a numerical example from [12, 13]. We define a pair of spectra,
$S_B$ and $S_S$, which we think of as a known "background" component and an
unknown "signal" component of a total spectrum. Both are symmetric and
defined in the frequency band from -0.5 to +0.5, though we plot only their
positive-frequency parts. $S_B$ is the sum of white noise with total power 5 and a
peak at frequency 0.215 corresponding to a single sinusoid with total power 2.
$S_S$ consists of a peak at frequency 0.165 corresponding to a sinusoid of total
power 2. Figure 1 shows a discrete-frequency approximation to the sum $S_S + S_B$
using 100 equispaced frequencies. From the sum, six autocorrelations were
computed exactly. $S_B$ itself was used as the initial estimate $P_B$ of $S_B$; i.e., $P_B$ was
Figure 1 without the left-hand peak. For $P_S$ we used a uniform (flat) spectrum
with the same total power as $P_B$. Figure 2 shows unweighted multisignal RESA
final estimates $Q_B$ and $Q_S$ [12]. The signal peak shows up primarily in $Q_S$, but
some evidence of it is in $Q_B$ as well. This is reasonable since $P_B$, although
exactly correct, is treated as an initial estimate subject to change by the data.
The signal peak can be suppressed from $Q_B$ and enhanced in $Q_S$ by weighting the
background estimate $P_B$ heavily [13].

In Figure 3 we show final estimates for uncertain constraints with an error
bound of $\varepsilon = 1$. The Euclidean distance (i.e., a constraint of the form (37)) was
used. The estimates were obtained with Newton-Raphson algorithms similar to
those developed by Johnson [15]. Both final estimates in Figure 3 are closer to
the corresponding initial estimates than is the case in Figure 2, since the sum of
the final estimates is no longer constrained to satisfy the autocorrelations exactly.
Figure 4 shows results for $\varepsilon = 3$; the final estimates are even closer to the initial
estimates. Because the example was constructed with exactly known
autocorrelations, it is not surprising that the exactly constrained final estimates are
better than those in Figures 3 and 4, which illustrate the more conservative
deviation from initial estimates that results from incorporating the uncertain
constraints.
V. DISCUSSION


A pleasant property of the new estimator, both in its general probability-density
form and in the power-spectrum form, is that it has the same functional
form as that for exact constraints. In the case of the power spectrum estimator,
this means that resulting final estimates are still all-pole spectra whenever
the initial estimates are all-pole and the weights are frequency-independent.

It appears that Ables was the first to suggest using an uncertain constraint
of the Euclidean form (37) in MESA [16]. The use of this and a weighted
Euclidean constraint in MESA was studied by Newman [17, 18]. This corresponds
to a diagonal matrix $M$ in (11). The generalization to general matrix constraints
has been studied by Schott and McClellan [19], who offer advice on how to
choose $M$ appropriately. The results presented herein differ in two main
respects: treatment of the multisignal case and inclusion of initial estimates.
Uncertain constraints have also been used in applying maximum entropy to
image processing [20, 21], although with a different entropy expression [22].